AITopics | contradictory information

contradictory information

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information

Neural Information Processing Systems | Dec-26-2025, 05:05:21 GMT

Automated reasoning with unstructured natural text is a key requirement for many potential applications of NLP and for developing robust AI systems. Recently, Language Models (LMs) have demonstrated complex reasoning capacities even without any finetuning. However, existing evaluation for automated reasoning assumes access to a consistent and coherent set of information over which models reason. When reasoning in the real world, the available information is frequently inconsistent or contradictory, and therefore models need to be equipped with a strategy to resolve such conflicts when they arise. One widely applicable way of resolving conflicts is to impose preferences over information sources (e.g., based on source credibility or information recency) and adopt the source with higher preference. In this paper, we formulate the problem of reasoning with contradictory information guided by preferences over sources as the classical problem of defeasible reasoning, and develop a dataset called BoardgameQA for measuring the reasoning capacity of LMs in this setting. BoardgameQA also incorporates reasoning with implicit background knowledge, to better reflect reasoning problems in downstream applications. We benchmark various LMs on BoardgameQA and the results reveal a significant gap in the reasoning capacity of state-of-the-art LMs on this problem, showing that reasoning with conflicting information does not surface out-of-the-box in LMs. While performance can be improved with finetuning, it nevertheless remains poor.
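
The conflict-resolution idea in this abstract (prefer the more credible or more recent source when two sources disagree) can be illustrated with a minimal Python sketch. This is not the BoardgameQA construction or the authors' method. The `Claim` type, the integer `preference` ranking, the source labels, and the example statements are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    statement: str   # the proposition being asserted or denied
    holds: bool      # True if the source asserts it, False if it denies it
    source: str      # hypothetical source label
    preference: int  # assumed ranking: higher value = more preferred source


def resolve(claims):
    """Keep, for each statement, the verdict of the most preferred source.

    On equal preference the claim seen first is kept (an arbitrary choice
    made for this sketch).
    """
    best = {}
    for claim in claims:
        current = best.get(claim.statement)
        if current is None or claim.preference > current.preference:
            best[claim.statement] = claim
    return {statement: claim.holds for statement, claim in best.items()}


# Two sources contradict each other on the first statement.
claims = [
    Claim("the dog attacks the cat", True, "old_report", preference=1),
    Claim("the dog attacks the cat", False, "recent_report", preference=2),
    Claim("the cat owns a ball", True, "old_report", preference=1),
]

beliefs = resolve(claims)
print(beliefs)
```

Here the contradiction is settled in favor of `recent_report` because it carries the higher assumed preference, while the uncontested statement passes through unchanged. A real defeasible-reasoning setup would also have to propagate such preferences through chains of rules, which this sketch does not attempt.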

boardgameqa, natural language reasoning, reasoning, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.39)

When Evidence Contradicts: Toward Safer Retrieval-Augmented Generation in Healthcare

Javadi, Saeedeh, Mirabi, Sara, Gangar, Manan, Ofoghi, Bahadorreza

arXiv.org Artificial Intelligence | Nov-11-2025

In high-stakes information domains such as healthcare, where large language models (LLMs) can produce hallucinations or misinformation, retrieval-augmented generation (RAG) has been proposed as a mitigation strategy, grounding model outputs in external, domain-specific documents. Yet, this approach can introduce errors when source documents contain outdated or contradictory information. This work investigates the performance of five LLMs in generating RAG-based responses to medicine-related queries. Our contributions are three-fold: i) the creation of a benchmark dataset using consumer medicine information documents from the Australian Therapeutic Goods Administration (TGA), where headings are repurposed as natural language questions, ii) the retrieval of PubMed abstracts using TGA headings, stratified across multiple publication years, to enable controlled temporal evaluation of outdated evidence, and iii) a comparative analysis of the frequency and impact of outdated or contradictory content on model-generated responses, assessing how LLMs integrate and reconcile temporally inconsistent information. Our findings show that contradictions between highly similar abstracts do, in fact, degrade performance, leading to inconsistencies and reduced factual accuracy in model answers. These results highlight that retrieval similarity alone is insufficient for reliable medical RAG and underscore the need for contradiction-aware filtering strategies to ensure trustworthy responses in high-stakes domains.

information, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.06668

Country: Oceania > Australia (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Media > News (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Beyond a Million Tokens: Benchmarking and Enhancing Long-Term Memory in LLMs

Tavakoli, Mohammad, Salemi, Alireza, Ye, Carrie, Abdalla, Mohamed, Zamani, Hamed, Mitchell, J Ross

arXiv.org Artificial Intelligence | Nov-3-2025

Evaluating the abilities of large language models (LLMs) for tasks that require long-term memory and thus long-context reasoning, for example in conversational settings, is hampered by existing benchmarks, which often lack narrative coherence, cover narrow domains, and only test simple recall-oriented tasks. This paper introduces a comprehensive solution to these challenges. First, we present a novel framework for automatically generating long (up to 10M tokens), coherent, and topically diverse conversations, accompanied by probing questions targeting a wide range of memory abilities. From this, we construct BEAM, a new benchmark comprising 100 conversations and 2,000 validated questions. Second, to enhance model performance, we propose LIGHT, a framework inspired by human cognition that equips LLMs with three complementary memory systems: a long-term episodic memory, a short-term working memory, and a scratchpad for accumulating salient facts. Our experiments on BEAM reveal that even LLMs with 1M token context windows (with and without retrieval-augmentation) struggle as dialogues lengthen. In contrast, LIGHT consistently improves performance across various models, achieving an average improvement of 3.5% to 12.69% over the strongest baselines, depending on the backbone LLM. An ablation study further confirms the contribution of each memory component.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.27246

Country:

North America > United States (0.45)
North America > Canada (0.45)
Asia > Middle East (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment (1.00)
Banking & Finance > Real Estate (1.00)
Education (0.92)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

BoardgameQA: A Dataset for Natural Language Reasoning with Contradictory Information

Neural Information Processing Systems | Jan-19-2025, 09:57:48 GMT

contradictory information, natural language reasoning, reasoning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.73)

Automated system can rewrite outdated sentences in Wikipedia articles

#artificialintelligence | Mar-7-2020, 17:17:18 GMT

A system created by MIT researchers could be used to automatically update factual inconsistencies in Wikipedia articles, reducing time and effort spent by human editors who now do the task manually. Wikipedia comprises millions of articles that are in constant need of edits to reflect new information. That can involve article expansions, major rewrites, or more routine modifications such as updating numbers, dates, names, and locations. Currently, humans across the globe volunteer their time to make these edits. In a paper being presented at the AAAI Conference on Artificial Intelligence, the researchers describe a text-generating system that pinpoints and replaces specific information in relevant Wikipedia sentences, while keeping the language similar to how humans write and edit.

information, outdated sentence, wikipedia article, (16 more...)

#artificialintelligence

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.40)

Industry: Media (0.32)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence (1.00)

Toward Argumentation-Based Cyber Attribution

Nunes, Eric (Arizona State University) | Shakarian, Paulo (Arizona State University) | Simari, Gerardo (Universidad Nacional del Sur)

AAAI Conferences | Apr-12-2016

A major challenge in cyber-threat analysis is combining information from different sources to find the person or the group responsible for the cyber-attack. It is one of the most important technical and policy challenges in cyber-security. The lack of ground truth for an individual responsible for an attack has limited previous studies. In this paper, we overcome this limitation by building a dataset from the capture-the-flag event held at DEFCON, and propose an argumentation model based on a formal reasoning framework called DeLP (Defeasible Logic Programming) designed to aid an analyst in attributing a cyber-attack to an attacker. We build argumentation-based models from latent variables computed from the dataset to reduce the search space of culprits (attackers) that an analyst can use to identify the attacker. We show that reducing the search space in this manner significantly improves the performance of classification-based approaches to cyber-attribution.

argument, culprit, exploit 1, (17 more...)

AAAI Conferences

Workshops at the Thirtieth AAAI Conference on Artificial Intelligence

Country:

South America > Argentina (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
North America > United States > Arizona > Maricopa County > Tempe (0.04)

Genre: Research Report (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.87)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (0.94)
